ABSTRACT

The 50 papers on educational evaluation presented at 
the 1972 AERA Conference are reviewed. The papers were classified 
into four categories: Theory and Methodology, Empirical Models, 
Empirical Studies, and Nationwide Evaluation and State Assessment. A 
list of the papers reviewed, their authors, and, when applicable, the 
ED numbers concludes the summary. (DB) 
INTRODUCTION



About 700 of the 1,000 papers presented at the 1972 AERA Annual Meeting in Chicago, 
Illinois, were collected by the ERIC Clearinghouse on Tests, Measurement, and Evaluation 
(ERIC/TM). ERIC/TM indexed and abstracted for announcement in Research in 
Education (RIE) 200 papers which fell within our area of interest: testing, measurement, 
and evaluation. The remaining papers were distributed to the other Clearinghouses in the 
ERIC system for processing. 

Because of an interest in thematic summaries of AERA papers on the part of a large 
segment of ERIC/TM users, we decided to invite a group of authors to assist us in 
producing such a series based on the materials processed for RIE. Four topics were 
chosen for the series: Criterion Referenced Measurement, Evaluation, Statistics, and 
Test Construction*. 

Most papers referred to in this summary may be obtained in either hard copy or 
microfiche form from: 

ERIC Document Reproduction Service (EDRS) 
P.O. Drawer O 
Bethesda, Maryland 20014 

Prices and ordering information for these documents may be found in any current 
issue of Research in Education. 
EDUCATIONAL EVALUATION

Joan S. Beers 



If the 50 papers on educational evaluation presented at 
the 1972 American Educational Research Association 
meeting in Chicago are indicative of the happenings in 
the field, clearly something occurred in educational 
evaluation this past year. The most obvious occurrence is 
that the number of papers nearly doubled from the 26 
presented at the 1971 AERA meeting. 

In 1971, evaluation was still going through the defining 
stage. The Phi Delta Kappa National Study Committee on 
Evaluation report, Educational Evaluation and Decision 
Making, was hot off the press and many of the 1971 
presenters directed their attention to a review of the 
report. Some models for implementation were developing, 
but few papers described actual program evaluations. 

This year, the largest number of presenters (18) 
described evaluation models, and as many presenters (14) 
reported on empirical studies as they did on theoretical 
issues. The thinking of the PDK National Study Committee 
still loomed large, as did the thinking of the big 
three S's: Scriven, Stake, and Stufflebeam. 

Considering the number of papers and the space 
allotted for this review, the most this reviewer will 
attempt to do is to bring some order to the many and 
diverse topics included under educational evaluation, cite 
the author(s), and comment briefly on the contents. 
There will be no attempt to pass judgment on the quality 
of the ideas presented. 

The papers fall, or in some cases were forced, into 
four categories: Theory and Methodology, Empirical 
Models, Empirical Studies, and Nationwide Evaluation and 
State Assessment. 



THEORY AND METHODOLOGY 



Fourteen papers were presented in this category. Merri- 
man designates and defines three areas which evaluators 
can serve in the public school system: developing evalua- 
tion and accountability systems, providing process and 
product evaluation of federally funded projects, and 
providing training and consultative services to faculty, 
parents and students. He cites, also, some basic problem 
areas of evaluation: credibility, threat, lack of personnel, 
and limited resources. 

Womble labels public school research a "two-faced 
profession." Researchers have the responsibility to find 
possible solutions to current educational problems and, at 
the same time, have the responsibility to communicate 
their findings to people other than trained researchers. 
She lists recommendations for public school researchers to 
be "two-faced" in order to have maximum impact on the 
advancement of education as well as the state of the art. 

Walker examines, in light of an educational setting, the 
five problem areas of educational evaluation identified by 
the PDK National Study Committee on Evaluation: definition, 
decision making, values and criteria, levels problems, 
and the research model. He makes suggestions for 
their recognition and avoidance and concludes with a 
series of hypotheses for further investigation. 

Briggs reminds us that the quality of local school 
evaluation is still at a low level and states firmly that 
increased participation of external evaluators from corpo- 
rations, consulting firms and universities in the evaluation 
efforts of school districts will only prolong a condition 
that needs radical changing. He proposes that new 
infusions of money, a broader definition of evaluation 
and an administrative restructuring of evaluation activities 
can change the system. 

Ashburn, in contrast to Briggs, sees the role of the 
external evaluator as necessary to bridge the credibility 
gap between public schools and public and private 
funding agencies. 

Woodbury and Jacobson make a posteriori recommendations 
for the evaluation of a performance contract. 
Among the recommendations are that the evaluator be 
involved in all planning and decisions from the very 
beginning, that teachers should not administer pretests, 
and that criterion referenced tasks based upon locally 
developed goals and objectives be used to measure pupils' 
performance. 

Ashburn, in a second paper, introduces the idea of 
certification for evaluators. After analyzing the pros and 
cons of certification, he concludes that school districts 
would be served best by a certification process that 
involved multiple levels, required periodic updating, was 
based on proficiency levels, and allowed school districts 
to participate in the training. 

Hutchinson expresses the notion that an evaluation 
methodology which does not assure that the data will be 
used does not accomplish its main purpose: to provide 
data for decision making. He says that for evaluation to 
accomplish its main purpose, the goals evaluated should 
be the decision maker's, the variables measured should be 
those of concern to the decision makers, the techniques 
used should possess decision maker validity, and data 
analysis should be made comprehensible to decision 
makers. 

Jones, expanding the views of Hutchinson, discusses 
further the aspect of "completeness" in evaluation. In 
order to attain completeness, the evaluator first must 
elicit the decision maker's entire goal intent. Jones 
describes a methodology, originated by Hutchinson, for 
the evaluator to lead the decision maker to the complete- 
ness of a goal intent. 

Forsyth analyzes Dyer's student change model of an 
educational system, a model in which performance indicators 
are computed based upon four groups of variables: 
input, process, and hard-to-change and easy-to-change 
surrounding conditions. Reliability is the focus of 
attention in the paper. 

Fortune identifies six problems peculiar to evaluators 
of social action programs: (1) the presence of value 
conflicts and the absence of procedures through which 
these conflicts can be negotiated; (2) the difficulty in 
identifying important baseline information needs at the 
beginning of a developmental program; (3) unreasonable 
goals; (4) the necessity to measure short-term symptoms 
of hypothesized long-term effects; (5) the lack of knowledge 
about the most appropriate time for measuring 
treatment effects; and (6) the inability to analyze the 
resulting complex interactions. 

The ever eloquent Stake asks this question of the 
evaluator of an instructional program: "Which is more 
important: to tell of some very special things about the 
program or to provide the most veridical portrayal of the 
program?" He opts for giving the client a substantive 
portrayal of the program rather than a focus on the more 
prominent features. According to Stake, "if the program 
glows, the evaluation should reflect some of it. If the 
program wobbles, the tremor should pass through the 
evaluation report." 

Tatsuoka disagrees with Stake, as well as with Stuffle- 
beam and with Guba. He argues that randomization, 
experimental design, and generalizability can be applied to 
the septic conditions of the classroom. He then proposes 
a strategy for evaluating nationwide intervention programs 
such as Headstart and Title I. 

Rippey discusses transactional evaluation. "... Transactional 
evaluation looks at the effects of changed programs 
on the changers themselves - on the incumbents of 
the roles in the system undergoing the change." Changes 
often involve threats to the roles of incumbents in an 
organization and changing programs require new skills and 
new behaviors. The aims of transactional evaluation are to 
transform the conflict energy associated with change into 
productive activity and to clarify the roles of all persons 
involved in program changes. 



EMPIRICAL MODELS 



Eighteen papers were presented in this category. Klein 
devised a formula to help decision makers compare the 
effectiveness of differing instructional programs. The 
formula is based on the rationale that general program 
effectiveness will increase if one or more of the following 
variables increases: number of objectives, success on the 
objectives, relative importance of the objectives, number of 
students in the program; or if pupil time and/or program 
costs decrease. 
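
One way to picture this rationale is a ratio that rises with each of the listed variables and falls with pupil time and program cost. The sketch below is only one functional form with that property; it is not Klein's formula, which this review does not reproduce. The class, field names, and sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Program:
    # Hypothetical fields: each objective is (success_rate in 0..1, importance_weight).
    objectives: list[tuple[float, float]]
    students: int
    pupil_hours: float
    cost_dollars: float


def effectiveness(p: Program) -> float:
    """Illustrative score: grows with the number, success, and importance of
    objectives and with enrollment; shrinks as pupil time or cost grows."""
    if p.pupil_hours <= 0 or p.cost_dollars <= 0:
        raise ValueError("pupil_hours and cost_dollars must be positive")
    weighted_success = sum(rate * weight for rate, weight in p.objectives)
    return weighted_success * p.students / (p.pupil_hours * p.cost_dollars)


# Two made-up programs with identical results; the second costs twice as much.
program_a = Program([(0.5, 2.0), (1.0, 1.0)], students=10, pupil_hours=4.0, cost_dollars=5.0)
program_b = Program([(0.5, 2.0), (1.0, 1.0)], students=10, pupil_hours=4.0, cost_dollars=10.0)
print(effectiveness(program_a) > effectiveness(program_b))
```

A multiplicative form is used here simply because it makes each variable's direction of effect easy to see; an actual comparison would depend on how the original formula weights the terms.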

Russell and Leithwood present, in great detail, a model 
to help decision makers base adopt-adapt-reject decisions 
about educational innovations on precise evaluation data. 

Fisher and Ward developed a design for evaluating 
educational programs for culturally disadvantaged children 
based upon the Piaget and Inhelder Taxonomy of Human 
Development. 

Smith presents a three-dimensional model for sum- 
mative evaluation of aesthetic education programs. One 
dimension consists of six fine art forms; the second 
dimension encompasses pupils' modes of behaving and 
experiencing; the third dimension is affective involvement. 

Jacobs reports on a four-stage model for program 
development and evaluation at the local school level. The 
fundamental thrust of the model is for more educational 
programming to be initiated at the local school level. 

Roid describes models for course evaluation in colleges 
and universities based upon systems designs. He observes, 
however, that there is little evidence that universities 
reward thorough evaluation of their courses by professors. 
He concludes that the important task is not the presentation 
of new systems or models but the changing of the 
structure and priorities of the university towards 
accountability. 

Doherty puts evaluation within the framework of 
PPBS, with educational goals forming the basis for all 
programs. 

Fraley developed an instructional accomplishment 
index: 

    Instructional Accomplishment Index = Learning / (Dollars Cost x Learner Time) 

to evaluate instructional modules. He details, with data, 
the use of the index. 
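
As a rough illustration of how such an index behaves, the sketch below reads it as learning divided by the product of dollar cost and learner time. The function name, argument names, units, and sample figures are assumptions made for illustration; they are not taken from Fraley's paper.

```python
def accomplishment_index(learning: float, dollars_cost: float, learner_time: float) -> float:
    """Learning gained per dollar per unit of learner time (assumed reading of the index)."""
    if dollars_cost <= 0 or learner_time <= 0:
        raise ValueError("dollars_cost and learner_time must be positive")
    return learning / (dollars_cost * learner_time)


# Hypothetical modules: equal learning gain, but the second takes twice the learner time.
module_a = accomplishment_index(learning=12.0, dollars_cost=3.0, learner_time=2.0)
module_b = accomplishment_index(learning=12.0, dollars_cost=3.0, learner_time=4.0)
print(module_a, module_b)
```

Under this reading, a module that produces the same learning at lower cost or in less learner time receives a higher index.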

Lasser reports on a model for resource support in the 
development of exportable instructional products. Defined 
and operationalized at the Southwest Regional Labora- 
tory, the model distinguishes several stages of continuing 
development effort and five broad functional areas of 
staff assignment. 

Doyle presents a model for doing transactional evalua- 
tion in program development and an evaluation design for 
the Ford Training and Placement Program for teachers at 
the University of Chicago. 

Hess and Wright list five stages through which curricu- 
lum development projects typically move, identify five 
different audiences for the information acquired by 
evaluative activities, and identify five major dimensions of 
a comprehensive evaluation of curriculum products. 

Light presents specified procedures for evaluating materials 
during their in-content tryout. She concludes that 
systematic formative evaluation is feasible even though 
classical experimental designs are not practical in formative 
evaluation. The systematic elimination of rival hypotheses 
is one design which appears useful in identifying 
inadequacies within an instructional system and in generating 
appropriate revisions. 

Johnson reports a general conceptual model of educa- 
tional research and development incorporating evaluation 
processes used in planning the National Program on Early 
Childhood Education at CEMREL, Inc. This model 
includes both formative and summative evaluation ac- 
tivities. 

Bashook reveals an attempt to bridge the gap between 
theory and practice in teaching. The paper describes an 
exploratory study to devise a method to analyze and 
evaluate concept teaching in university science courses. 

Luft, Lujan and Bemis describe the Quality Assurance 
Model for process evaluation developed by the South- 
western Cooperative Educational Laboratory. The model 
provides administrators with the opportunity to maximize 
desired outcomes of educational programs. 

Abedor describes the development and field testing of 
a flow chart model for formative evaluation of 
self-instructional multi-media learning systems. 

Miller reviews the development and application of a 
formative evaluation model in the design of a mathematics 
laboratory for young children. 

Finally, Huberty presents an evaluation system for a 
psychoeducational treatment program for emotionally 
disturbed children. He emphasizes that it is important for 
the evaluation to be easily implemented and clinically 
useful. 



EMPIRICAL STUDIES 



Fourteen papers are in this category. Cypress and 
DeBloois describe the application of a formative evaluation 
procedure for school staffing models. Three models 
were evaluated as to comprehensiveness, feasibility, and 
viability and classified according to four organizational 
types. The authors conclude that the evaluation techniques 
have potential for evaluating the characteristics of 
any school organization. 

Otto describes one school district's application of the 
System for Objective Based Evaluation-Reading, devel- 
oped by the Center for the Study of Evaluation at UCLA. 
To date, the district has established goals and performance 
objectives for each grade level and has begun development of 
an assessment system. 

VanMondfrans, Schott, and French compared students' 
achievement in a variety of subject areas under block 
scheduling and traditional scheduling. Overall results in 
subject matter tests favored students in the traditional 
scheduling treatment. There were no significant differences 
between treatments for attitude and interest scores. 

Kievit details an evaluation of a training program to 
prepare teachers to serve as workshop leaders to initiate 
curriculum change. The findings support the feasibility of 
selecting teachers for leadership training who are most 
likely to be responsive to efforts to diffuse innovation. 

Rush, McElhinney, and Junkel evaluate the impact 
resulting from training and engaging public school educators 
as data collectors. Based on data collected through 
participant observations and questionnaires, the investigators 
concluded that public school personnel can be 
trained to serve effectively as data collectors for curriculum 
evaluation. 

Barry reports the results of an inservice program on 
evaluation for elementary principals and curriculum 
coordinators. Although the results show that participants 
scored significantly higher than the control group in 
cognitive skills, only four of the 24 participants scored 
above 70 per cent on the cognitive test. The test is 
included in the paper. 

Me ching describes the results of an effort to apply a 
classification system to a set of terminal objectives in 
reading. He concludes that reading objectives are no more 
difficult to classify than are objectives in other instruc- 
tional content areas. 

Hartlage reports the results of a comparative evaluation 
of three approaches to initial reading instruction: the 
phonic approach, a look-say approach, and a special 
alphabet approach. The data suggest that for beginning 
first graders whose readiness levels are at or above the 
national average, the special alphabet approach is signifi- 
cantly better for teaching word recognition. 

Ginther conducted transactional evaluation to learn 
more about what teachers emphasize in their work with 
senior medical students in a group clinic. Naturalistic 
observational techniques were used to gather data. 

Dembo and Wilson report on an evaluation of a 
performance contract in reading. Only 13 per cent of the 
2,500 seventh grade pupils reached the objectives of the 
program. The authors make several recommendations to 
school districts who are involved in performance con- 
tracting. 



Ellner describes the results of a summative evaluation 
of a program to prepare day care administrators. As a 
result of instruction, 75 per cent of the objectives were 
achieved. The paper provides a model for curriculum 
development and evaluation of much-needed day care 
training projects. 

The final two papers in this section are ambitious and 
exhaustive studies of Title I programs and evaluation 
practices. Hayman, Mazure and Napier report on a survey 
of Title I evaluation practices in 20 urban districts 
throughout the country. Five general areas of concern are 
identified: planning and funding, design and implementation, 
impacting the decision process, personnel, and 
state and federal relationships. One resulting recommendation 
is that at least five per cent of project resources 
should go into evaluation. 

Brown conducted a study of the composition and 
disbursement of 16 Title I projects assigned to 63 schools 
in Philadelphia. His design included correlational analysis, 
factor analysis, and content analysis. One of the findings 
suggests that although the needs of disadvantaged pupils 
traditionally have been combined under a general term 
"target population," significantly different subsets of 
pupils and schools exist within an LEA such that general 
needs, pupil-service needs and achievement patterns of 
each subset represent a distinctly different variety and 
level of resource funds. He suggests that evaluation of 
Title I projects be reconsidered to provide data relevant 
to particular subsets of pupils. 



NATIONWIDE EVALUATION AND STATE ASSESSMENT 



The final four papers are in this category. Willard speaks 
to the question: "What kind of evaluation needs has the 
Federal government, particularly the USOE?" He proposes 
the nationwide survey as one approach to nationwide 
evaluation and describes the structure for such a survey as 
developed by the Joint Federal/State Task Force on 
Evaluation. He recommends that the survey approach be 
used only as a means to answer policy issues that call for 
a few simple questions. For policy issues that require 
more complex data, he suggests alternative approaches, 
such as observation techniques. 

Bickner and Mood highlight some of the problems in 
translating research findings into educational policy at the 
national level. They discuss the divisions of responsibility 
for education, the multiplicity of educational objectives, 
the lack of faith in research findings and the conse- 
quences of rapid change. 

Thorndike talks about some aspects of the results from 
a study of reading achievement in 15 countries. The 
inability to differentiate the effects of different types and 
qualities of schools on achievement (effects apart from 
pupils' family background) is not only a national problem 
but also an international problem. 

Impara reports on educational assessment in the state 
of Florida. The Florida plan calls for assessment not only 
in the areas of basic skills but also in the areas of 
communication and learning skills, citizenship, occupational 
interests, mental and physical health, home and 
family relationships, esthetic and cultural appreciations 
and human relations. 

A Final Note 

If the purpose of educational evaluation is to provide 
information for decision making, the ultimate usefulness 
of evaluation is yet to be recorded. What kinds of 
decisions are made as a result of evaluation? How useful 
is evaluation to the decision making process? If educational 
evaluation is to continue to move forward, hopefully 
the answer to these questions will be one focus of 
the papers to be presented at the 1973 AERA meeting. 